arXiv:1507.08076v1 [cs.CV] 29 Jul 2015


Cross-pose Face Recognition by Canonical Correlation Analysis 


Annan Li a , Shiguang Shan b , Xilin Chen b , Bingpeng Ma c , Shuicheng Yan a , Wen Gao d 

a Department of Electrical and Computer Engineering, National University of Singapore, Singapore. 
b Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
c University of Chinese Academy of Sciences, Beijing, China 
d Institute of Digital Media, Peking University, Beijing, China 


Abstract 

The pose problem is one of the bottlenecks in automatic face recognition. We argue that one of the difficulties in this problem is 
the severe misalignment in face images or feature vectors with different poses. In this paper, we propose that this problem can be 
statistically solved or at least mitigated by maximizing the intra-subject across-pose correlations via canonical correlation analysis 
(CCA). In our method, based on the data set with coupled face images of the same identities and across two different poses, CCA 
learns simultaneously two linear transforms, each for one pose. In the transformed subspace, the intra-subject correlations between 
the different poses are maximized, which implies that pose invariance or pose robustness is achieved. The experimental results show
that our approach considerably improves the recognition performance. When further enhanced with a holistic+local feature
representation, the performance is comparable to the state of the art.

Keywords: Face Recognition, Canonical Correlation Analysis, Face Recognition Across Pose. 


1. Introduction 

Automatic face recognition is a classical research topic in computer vision and pattern recognition. After more than 30 years of research, the performance of face recognition systems has greatly improved. In some evaluation tests, computer-based face recognition systems even outperform humans [1]. However, this high performance is usually achieved under controlled imaging conditions, which typically means that the face images are acquired under frontal view, neutral expression and mild illumination. These requirements are often unrealistic in real-world applications, since variations of pose, illumination and expression are very common and uncontrollable. When these variations are present, the performance of face recognition often drops significantly [2]. Thus, the problem of face recognition is far from being solved. Among the above-mentioned challenges, the pose problem is one of the most important and difficult issues for face recognition. In this paper we present a novel pose-robust face recognition approach that can significantly improve the recognition performance.

The variations of appearance caused by pose differences in 2D face images are related to two factors, i.e., the viewpoint and the 3D facial shape. The 3D shape of the human face has a complex structure. Thus, in a 2D image taken from a different viewpoint, the locations of surface points on the face change differently. That is to say, a pair of points close in one pose may become far away from each other in another pose. This inner structure distortion in the image makes alignment very difficult. Furthermore, the 3D shape of the human face is not ideally convex, and its concave areas lead to occlusion. That is, when the pose changes, some visible parts of the face may become invisible while some invisible parts may become visible. In vector-based approaches, the elements of the feature vector are sampled continuously with equal step on the 2D face image or on the output of some filters applied to the face image, such as Gabor filters [3]. However, when the pose difference is large, the inner distortion and occlusion mentioned above lead to misalignment and noise in the feature vectors. As shown in Figure 1, if the elements are sampled in the same way on frontal and non-frontal faces, the feature vectors are misaligned and even contain some non-face elements. It is not surprising that the performance of face recognition is poor with such feature vectors. This also explains a particular phenomenon in vector-based cross-pose face recognition: the distance between two faces of different people under similar viewpoints is often smaller than that between faces of the same person under different viewpoints (see Figure 2). Viewed from this point, the key problem for pose-robust face recognition is how to measure the similarity of identity between two misaligned and noisy feature vectors.

Figure 1: Pose variations lead to misalignment and noise in the feature vectors. The face images under frontal view and 45° are generated by a 3D face model. The corresponding face regions are given by the ground truth 3D shape and illustrated in the same color. The feature vectors are misaligned, and the feature vector of the non-frontal face even contains non-face elements.

Figure 2: In vector-based approaches, faces in similar poses are more similar than those in different poses, even for faces of the same identity.


Aligning the elements in feature vectors is equivalent to reconstructing the fine 3D shape of the face. Because of the complexity of the 3D face, misalignment is a very challenging problem. However, this does not mean that the problem is insoluble. Although the human face varies from person to person, the variations caused by pose difference in 2D face images share particular statistical characteristics. In a specific pose, the faces of a given person can be well represented by a linear subspace. The difficulty of the pose problem is that the subspaces of different poses differ due to the misalignment mentioned above. However, these different subspaces can be aligned using face pairs coupled by identity [4, 5], as shown in Figure 3. So, the problem of recognizing faces across poses can be formulated as how to measure the similarity between two vectors from different subspaces. We propose that this theoretical problem can be elegantly solved by using the Canonical Correlation Analysis (CCA) method [6]. CCA maximizes the correlations between two different sets of variables. Thus, if CCA is performed on face pairs coupled by identity across two poses, the intra-subject correlations between the two poses are maximized. Consequently, the problem caused by misalignment is statistically avoided or mitigated.

Based on the above analysis, we propose a novel approach for recognizing faces across poses. In this approach, two sets of pose-specific basis vectors are learned simultaneously from the coupled face data. Feature vectors from different poses are projected onto the respective basis vectors, and face recognition across pose differences is then performed by matching these projections. The basic idea of this approach is illustrated in Figure 3. CCA was first applied to the pose problem in our preliminary work [7], in which it is mainly used as a patch matching and virtual view synthesis method. In contrast, in this paper we extend the CCA-based recognition approach by adopting a facial feature representation that integrates holistic features and local Gabor features. Based on this feature representation, multiple classifiers are built via CCA and combined to enhance the recognition performance.

The remainder of this paper is organized as follows. Section 2 gives a brief review of cross-pose face recognition, and Section 3 describes our method in detail. Subsequently, Section 4 presents the experimental results, and lastly we draw conclusions and discuss future work in Section 5.


2. Related Works 

Robustness to pose change is a challenging and classical problem in face recognition research. Surveys of the related literature can be found in [8, 9]. When multiple face images under different views are available, the difficulty of the pose problem is much reduced. But in real-world applications such a requirement is not always feasible. In this paper, we only consider the basic and most challenging scenario: recognizing faces across pose differences using a single query image. According to their main contribution, related works can be roughly categorized into three classes, i.e., geometrical approaches, statistical approaches and hybrid approaches.

As described in the previous section, misalignment is the key factor that leads to performance degradation when pose variations are present. So, pursuing geometric alignment of faces is a natural way to tackle the pose problem. Beymer and Poggio [10] used a single example 2D image to predict non-frontal faces from frontal faces. In this approach, dense pixel-to-pixel alignment is obtained from a pair of 2D example images. In some sense, pursuing precise pixel-to-pixel alignment across different viewpoints is equivalent to reconstructing the 3D shape of the face. When 3D face data is available, many approaches for recovering the 3D facial shape from a 2D image have been proposed. Among them, the 3D morphable model proposed by Blanz and Vetter [11] is considered the state of the art. By fitting the statistical 3D model to the input face, a high recognition rate can be achieved using the representation coefficients or the transformed images [12]. Although the optimization process of fitting guarantees the reconstruction accuracy, it brings the problem of high computational complexity. An extension of this work with a spherical harmonic illumination model can be found in [13]. Recently, Asthana et al. [14] used active appearance models (AAMs) [15] to detect facial landmarks and reconstruct the 3D facial shape based on the landmarks. Cross-pose face recognition is performed by matching the synthesized faces using local binary patterns [16]. Mostafa et al. [17] proposed a similar work in which active shape models (ASMs) [18] are utilized for facial landmark detection.

Although methods like the 3D morphable model achieve good performance, accurate 3D face reconstruction is still a complicated and difficult problem. Prabhu et al. [19] argued that the depth information of faces may not be significantly discriminative for modeling 2D pose variability. Thus, if one could obtain several generic canonical depth maps for different input face groups, such as race, age, and gender, then face images under different poses could be easily rendered. Consequently, they proposed 3D Generic Elastic Models for generating new views of a template face. Face recognition across pose is performed by matching the input faces with the rendered faces. Besides the work of [19], Castillo and Jacobs [20, 21] also proposed a simplified geometrical approach for cross-pose face recognition. They simplified the 3D shape of the face as a cylinder and recognized faces through stereo matching. To address the problem of large pose differences, an improvement of this approach using surface slant was proposed recently [22]. Since it does not need to perform 3D reconstruction, recognizing faces across pose via stereo matching is simple and effective.

Different from the dense point-to-point alignment approaches mentioned above, sparse alignment approaches only use several facial landmarks, such as the eye corners and the nose tip. These landmarks are usually salient in appearance. Thus, the difficulty of alignment and the computational cost can be greatly reduced. Wiskott et al. [23] proposed the Elastic Bunch Graph Matching method, in which several fiducial points are elastically matched using local Gabor features. AAMs [15] are similar to the 3D morphable model. A major difference between them is the shape model: in AAMs, the morphable model is simplified to a set of facial landmarks in 2D space. Some multi-view extensions of AAMs can be found in [24, 25, 26].

Besides the geometrical approaches, building statistical models is another popular way to recognize faces across poses. A typical statistical approach is the eigen light-field method proposed by Gross et al. [27]. They built a complete appearance model including all possible pose variations. A test image can be viewed as a part of this complete model, and the missing parts are estimated from the available parts. Recognition is performed by comparing the coefficients of the complete appearance model. The tied factor analysis method proposed by Prince et al. [28] is another typical statistical approach. In this method, factors tied across pose differences are learned using the Expectation Maximization algorithm. Face recognition is then performed based on a probabilistic distance metric built on the factor loadings. Besides principal component analysis and factor analysis, Li et al. [29] and Sharma and Jacobs [30] applied partial least squares to cross-pose face recognition. They used partial least squares to learn a pair of projection matrices for two different poses, and cross-pose face recognition is performed by comparing the “intermediate correlated projections”.

Besides building pose-robust statistical models, statistically transforming faces or features from one pose to another is another way to tackle the pose problem. Sanderson et al. [31] transformed the frontal face model to non-frontal views to extend the gallery set. Lee and Kim [32] constructed feature spaces for each pose using kernel principal component analysis, and then transformed the non-frontal face to frontal through the feature spaces. Different from the foregoing two methods that transform holistic faces, Chai et al. [33] performed linear regression on local patches for virtual frontal view synthesis. Li et al. [34] embedded a bias-variance trade-off in the cross-pose linear regression models by using ridge and lasso regression. This bias-variance trade-off achieved considerable improvements in recognition performance. Choi et al. [35] applied null space linear discriminant analysis in their face recognition approach dealing with both pose and illumination variations.

Geometrical alignment can directly reduce the pose difference, but alignment itself cannot deal with occlusion. Statistical approaches can mitigate the occlusion problem to some extent, but their performance relies heavily on the training data. Thus, combining geometrical and statistical information is an alternative way to tackle the pose problem. We term this kind of method the hybrid approach. One way to integrate geometrical and statistical information is to combine local statistical models with coarse geometrical alignment. Kanade and Yamada [36] proposed a probabilistic framework to build and combine local statistical models for recognizing faces under different viewpoints, in which the statistical models are built on local patches of the face images. Based on this framework, Liu and Chen [37] used a simple 3D ellipsoid model to align patches across different poses, while Ashraf et al. [38] aligned the patches by learning a 2D affine transform for each patch pair via a Lucas-Kanade [39] like optimization procedure. Unlike the above methods, which focus on matching local patches, Lucey and Chen [40] extended Kanade and Yamada’s work by building similar statistical models between holistic non-frontal faces and local patches of the frontal face.

Besides combining local statistical models with coarse geometrical alignment, another way to integrate geometrical and statistical information is to embed geometrical information into statistical models. Specifically, in this kind of method, the combination is accomplished by augmenting the feature vector that represents a local region with its spatial location, and a face is represented as a set of augmented feature vectors. Based on this face representation, Zhao and Gao [41] sampled the augmented feature vectors through key point detection, and used a modified Hausdorff distance to measure the similarity between two faces. Wright and Hua [42] densely sampled the augmented feature vectors on the face, and quantized them into histograms via random projection trees. Face recognition is then performed by matching the histograms. Benefiting from the dense sampling and quantization, the matching is spatially elastic to some extent. Thus, the problem of alignment is implicitly alleviated.

The method proposed in this paper is a statistical approach. Based on the foregoing analysis, it is difficult for a purely statistical approach to achieve good recognition performance, especially when the pose difference is large. Therefore, we extend our method by using some facial landmarks. Similar statistical models are built on the regions centered at these landmarks. So, the enhanced recognition method is a hybrid approach that integrates local statistical models with coarse geometrical alignment.

3. Recognizing Face Across Pose via Canonical Correlation Analysis

The problem of recognizing faces with pose difference can be formulated as measuring the similarity between two vectors from different linear subspaces. We propose that it can be solved by Canonical Correlation Analysis. In this section, we describe CCA in detail and explain how it leads to pose invariance or pose robustness by learning from the coupled face data. The CCA-based recognition approach is also enhanced by integrating holistic and local facial features.

3.1. Recognizing Face Across Pose via CCA 

Proposed by Hotelling [6], canonical correlation analysis is a classical technique in statistical learning. It has been widely used in pattern recognition, and in recent years it has also been applied to face analysis. Sun and Chen [43, 44] modified the CCA model with soft labels and locality preserving projection respectively, and applied their methods to frontal face recognition. Ma et al. [45] improved the CCA model by maximizing the difference between the within-class correlations and the between-class correlations. Their method is also applied to frontal face recognition. Kim et al. [46] proposed a similar improvement of the CCA model, but applied their model to face video analysis. Zheng et al. [47] used kernel CCA for recognizing facial expressions. Reiter et al. [48] and Lei et al. [49] used CCA to reconstruct 3D facial shape. Yang et al. [50] applied CCA to 2D-3D face matching.


“Canonical correlation analysis can be seen as the problem of finding basis vectors for two sets of variables such that the correlation between the projections of the variables onto these basis vectors are mutually maximized” [51]. As illustrated in Figure 3, if the faces from two different poses are coupled by identity and CCA is performed on these face pairs, the correlations between the same identities are maximized. Therefore, this intra-subject correlation is robust to pose difference.

Figure 3: The illustration of our approach. In the training phase two sets of basis vectors are learned via canonical correlation analysis using the coupled face data. In the testing phase faces are projected onto the pose-specific basis vectors. Recognition is performed by comparing these projections.

Let $(X, Y)$ be the coupled training set of faces from two different poses, where $X = [x_1, x_2, \ldots, x_n]$ and $Y = [y_1, y_2, \ldots, y_n]$. Each face image is represented by a feature vector, and $n$ is the number of image/vector pairs. Both $X$ and $Y$ are normalized to have zero mean. Our goal is to find two sets of basis vectors, one for each pose, such that the correlations between the projections of the variables onto them are mutually maximized. Denote a pair of basis vectors as $(w_x, w_y)$. The correlation $\rho$ between the projections $w_x^T X$ and $w_y^T Y$ is

\[ \rho = \frac{E[w_x^T X Y^T w_y]}{\sqrt{E[w_x^T X X^T w_x]\, E[w_y^T Y Y^T w_y]}} \tag{1} \]

Here, $E[f(x, y)]$ is the empirical expectation of the function $f(x, y)$.

Since the means of $X$ and $Y$ are zero, the total covariance matrix of $(X, Y)$ can be written as:

\[ C = \begin{pmatrix} C_{xx} & C_{xy} \\ C_{yx} & C_{yy} \end{pmatrix} = E\left[ \begin{pmatrix} X \\ Y \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}^T \right] \tag{2} \]

where $C_{xx}$ and $C_{yy}$ are the within-pose covariance matrices of $X$ and $Y$ respectively, and $C_{xy} = C_{yx}^T$ is the within-subject covariance matrix between the two poses. Thus, the objective function maximizing the correlations can be described as:


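\[ (w_x, w_y) = \arg\max_{(w_x, w_y)} \frac{w_x^T C_{xy} w_y}{\sqrt{w_x^T C_{xx} w_x \; w_y^T C_{yy} w_y}} \tag{3} \]

The solutions for $w_x$ and $w_y$ can be found by solving the following eigenvalue equations [52]:

\[ \begin{aligned} C_{xx}^{-1} C_{xy} C_{yy}^{-1} C_{yx} w_x &= \rho^2 w_x \\ C_{yy}^{-1} C_{yx} C_{xx}^{-1} C_{xy} w_y &= \rho^2 w_y \end{aligned} \tag{4} \]

Only one of the equations needs to be solved, because the solutions are related by

\[ \begin{aligned} C_{xy} w_y &= \rho \lambda_x C_{xx} w_x \\ C_{yx} w_x &= \rho \lambda_y C_{yy} w_y \end{aligned} \tag{5} \]

where

\[ \lambda_x = \lambda_y^{-1} = \sqrt{\frac{w_y^T C_{yy} w_y}{w_x^T C_{xx} w_x}} \tag{6} \]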
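As an illustration of Equations (4) and (5), the following NumPy sketch solves the eigenvalue equation for $w_x$ on synthetic data and recovers $w_y$ from $w_x$. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `cca_basis`, the synthetic data and all sizes are hypothetical, and the covariance matrices are assumed invertible (more samples than dimensions).

```python
import numpy as np

def cca_basis(X, Y, k):
    """Return (Wx, Wy, rho) for zero-mean data X (dx x n) and Y (dy x n)."""
    n = X.shape[1]
    Cxx, Cyy, Cxy = X @ X.T / n, Y @ Y.T / n, X @ Y.T / n
    # Eq. (4): Cxx^{-1} Cxy Cyy^{-1} Cyx w_x = rho^2 w_x
    M = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    evals, evecs = np.linalg.eig(M)
    order = np.argsort(-evals.real)[:k]          # k largest eigenvalues
    rho = np.sqrt(np.clip(evals.real[order], 0.0, 1.0))
    Wx = evecs[:, order].real
    # Eq. (5): w_y is proportional to Cyy^{-1} Cyx w_x
    Wy = np.linalg.solve(Cyy, Cxy.T @ Wx)
    return Wx, Wy, rho

# Synthetic coupled data (hypothetical): both "poses" share a 2-D latent signal.
rng = np.random.default_rng(0)
n = 500
z = rng.standard_normal((2, n))
X = rng.standard_normal((4, 2)) @ z + 0.1 * rng.standard_normal((4, n))
Y = rng.standard_normal((5, 2)) @ z + 0.1 * rng.standard_normal((5, n))
X -= X.mean(axis=1, keepdims=True)
Y -= Y.mean(axis=1, keepdims=True)

Wx, Wy, rho = cca_basis(X, Y, k=2)

# Empirical correlation of the first pair of projections, as in Eq. (1).
px, py = Wx[:, 0] @ X, Wy[:, 0] @ Y
empirical = (px @ py) / np.sqrt((px @ px) * (py @ py))
```

On the training data, the empirical correlation of the first projection pair coincides with the square root of the largest eigenvalue, which is the property the objective in Equation (3) is built around.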


Eigen-decomposition yields a set of orthogonal eigenvectors. Therefore, we have multiple eigenvector pairs like $(w_x, w_y)$. Denote them as $W_x = [w_{x1}, w_{x2}, \ldots, w_{xk}]$ and $W_y = [w_{y1}, w_{y2}, \ldots, w_{yk}]$. Here $k$ is the number of eigenvectors in both sets. Once the optimized $W_x$ and $W_y$ are obtained, face recognition across poses can be performed by measuring the intra-subject correlations. For a pair of gallery and probe faces $(x_{input}, y_{input})$, we first project them onto their corresponding basis vectors:

\[ \hat{x} = W_x^T (x_{input} - x_{mean}), \qquad \hat{y} = W_y^T (y_{input} - y_{mean}) \tag{7} \]

The similarity between the gallery and probe faces is measured by the correlation between the projections $\hat{x}$ and $\hat{y}$.
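The projection of Equation (7) followed by correlation-based matching might be sketched as below. This is only an illustrative sketch: the helper names `correlation` and `identify` are hypothetical, the correlation is assumed to be the normalized inner product of the two projections, and the toy bases (an identity matrix and a permutation) stand in for learned $W_x$ and $W_y$.

```python
import numpy as np

def correlation(a, b):
    """Normalized inner product of two vectors (assumed similarity measure)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(probe, gallery, Wx, Wy, x_mean, y_mean):
    """Return (index, scores) of the gallery column best correlated with the probe.

    gallery: (dx, m) faces in pose X; probe: (dy,) face in pose Y.
    """
    y_hat = Wy.T @ (probe - y_mean)                       # Eq. (7), probe side
    scores = [correlation(Wx.T @ (g - x_mean), y_hat)     # Eq. (7), gallery side
              for g in gallery.T]
    return int(np.argmax(scores)), scores

# Toy setup (hypothetical): the "pose change" is a permutation P of the features,
# and the bases are chosen so that projecting undoes it.
rng = np.random.default_rng(1)
gallery = rng.standard_normal((3, 4))        # 4 toy gallery faces in pose X
P = np.eye(3)[[2, 0, 1]]
Wx, Wy = np.eye(3), P
zero = np.zeros(3)
probe = P @ gallery[:, 1] + 0.01 * rng.standard_normal(3)

best, scores = identify(probe, gallery, Wx, Wy, zero, zero)
```

In this toy case the probe is a slightly perturbed, permuted copy of the second gallery face, so the highest score falls on that face.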



In recognition, we simply use the nearest-neighbor classifier: a probe face is assigned to the gallery face with the highest correlation value.

Similar to Fisher’s linear discriminant analysis, the CCA method also suffers from a singularity problem, because the covariance matrices $C_{xx}$ and $C_{yy}$ may not be invertible. There are two methods to solve this problem. In the first, the covariance matrices are regularized by adding a small value to the diagonal elements. Denoting the small value as $\alpha$, we have:

\[ C^*_{xx} = C_{xx} + \alpha I, \qquad C^*_{yy} = C_{yy} + \alpha I \tag{9} \]

By replacing $C_{xx}$ and $C_{yy}$ with the regularized covariance matrices $C^*_{xx}$ and $C^*_{yy}$ in Equation (4), the singularity problem can be avoided. The second method is to perform principal component analysis (PCA) before CCA, which is similar to the fisherfaces method [53]. Besides solving the singularity problem, PCA also reduces the dimension of the data. Thus, the computational cost in the training phase is also reduced. If the training
data is adequate, performing PCA before CCA seems a better 
way to solve the singularity problem. However, the available 
multi-pose face data is not adequate enough. For example, in 
the PIE database ID there are only 68 subjects. After PCA, the 
reduced dimension is too low to represent the facial appearance 
variations. Therefore, in this paper we choose the covariance 
matrix regularization for solving the singularity problem. 
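The effect of the regularization in Equation (9) can be checked numerically. The sketch below uses random data with more feature dimensions than training samples, so the sample covariance is rank deficient; the sizes and the value of `alpha` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 20                       # more feature dimensions than training pairs
X = rng.standard_normal((d, n))
X -= X.mean(axis=1, keepdims=True)  # zero-mean, as assumed for CCA

Cxx = X @ X.T / n                   # singular: rank is at most n - 1
alpha = 1e-2                        # illustrative value only
Cxx_reg = Cxx + alpha * np.eye(d)   # Eq. (9)

rank_before = np.linalg.matrix_rank(Cxx)
rank_after = np.linalg.matrix_rank(Cxx_reg)
min_eig_after = np.linalg.eigvalsh(Cxx_reg).min()
```

After regularization every eigenvalue is at least about `alpha`, so the inverse needed in Equation (4) exists; the price is a small bias toward the unnormalized covariance directions.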

The learning results of CCA are two sets of eigenvectors. Similar to eigenfaces [54], we can display these coupled eigenvectors as images, which we call corrfaces. Figure 4 presents the first 20 corrfaces learned from the CAS-PEAL database with 30° pose difference. Unlike eigenfaces, corrfaces come in coupled pairs. We can see that in each pair of corrfaces, the ghost faces belong to the same “ghost”. The eigenvectors $W_x$ and $W_y$ can be viewed as two coordinate systems that reflect the same identity information in different pose-specific subspaces. When faces of different poses are projected onto $W_x$ and $W_y$ respectively, the influence of identity is emphasized while the influence of pose difference is reduced. In Figure 5 we demonstrate the learning results on the CAS-PEAL database by comparing the histograms of the correlation values. Figure 5(a) presents the histogram of the correlations between the original feature vectors, and Figure 5(b) shows the histogram of the correlations after projecting the original feature vectors onto the corrfaces. We can see that in Figure 5(b) the correlation distributions of the same and different identities are better separated. The influence of pose misalignment in the feature vectors is considerably reduced.


[Figure 5 panels: “0° vs. 45° - Original Feature” and “0° vs. 45° - CCA” histograms with correlation on the horizontal axis (range -1 to 1); panel (d) compares Original, CCA and PLS.]

Figure 5: Histograms of correlations before and after CCA/PLS modeling, calculated on the CAS-PEAL database across a 45° pose difference. In the original feature space (a), correlations between the same and different persons are mixed together. After CCA and PLS modeling (b, c), the histogram centers (mean values) of the correlations between the same people (the dashed vertical lines) move to the right, which means that the cross-pose intra-individual correlations are enhanced. The difference between CCA and PLS modeling is that the correlations after CCA modeling have lower variance (d), which leads to better separation of the histograms.

Table 1: The matrices A and B for PCA, PLS and CCA.

Model    A                                                              B
PCA      $C_{xx}$                                                       $I$
PLS      $\begin{pmatrix} 0 & C_{xy} \\ C_{yx} & 0 \end{pmatrix}$       $\begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix}$
CCA      $\begin{pmatrix} 0 & C_{xy} \\ C_{yx} & 0 \end{pmatrix}$       $\begin{pmatrix} C_{xx} & 0 \\ 0 & C_{yy} \end{pmatrix}$


3.2. Comparisons with Partial Least Squares 

Partial Least Squares (PLS) is similar to CCA in modeling correlations between two different sets, and the relations between them have been well studied. In this work, we compare them by following the formulation in [52]. The eigen-problems in Equation (4) can be formulated as a single eigenvalue equation:

\[ B^{-1} A w = \rho w, \tag{10} \]

where

\[ A = \begin{pmatrix} 0 & C_{xy} \\ C_{yx} & 0 \end{pmatrix}, \quad B = \begin{pmatrix} C_{xx} & 0 \\ 0 & C_{yy} \end{pmatrix}, \quad w = \begin{pmatrix} w_x \\ w_y \end{pmatrix}. \tag{11} \]

Equation (10) provides a unified framework for solving PCA, PLS and CCA with different matrices $A$ and $B$, as described in Table 1.

As can be seen, the difference between PLS and CCA is the choice of matrix $B$, which corresponds to the normalization denominator in Equation (3). In other words, the normalization of the correlations is the key difference between PLS and CCA modeling. In Figure 5 we illustrate the results of PLS and CCA modeling obtained from the CAS-PEAL database with 45° pose difference. After PLS and CCA modeling, the distribution centers (dashed vertical lines) of the intra-individual correlations both move to the right, which means their values are maximized. This phenomenon reflects the influence of matrix $A$ and the numerator in Equation (3), which are the same for PLS and CCA. As shown in Figure 5(d), the results of CCA modeling have lower variance, which demonstrates the influence of matrix $B$ and the normalization denominator in Equation (3). Because of the lower variance, CCA separates the histograms better than PLS does. This shows that the variance of intra-individual correlations is high in cross-pose face recognition, and that the normalization in CCA reduces the variance and makes an additional contribution to the improvement in performance.
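The role of matrix $B$ in Equation (10) can be made concrete with a small NumPy sketch. It builds $A$ and the two choices of $B$ from Table 1 on synthetic data and compares the leading eigenvalues. The data, the sizes and the helper name `top_eigenvalue` are hypothetical, and one data set is deliberately rescaled to show that only the CCA value is bounded by 1.

```python
import numpy as np

def top_eigenvalue(A, B):
    """Largest eigenvalue of B^{-1} A, i.e. of Eq. (10)."""
    return float(np.max(np.linalg.eigvals(np.linalg.solve(B, A)).real))

# Synthetic coupled data (hypothetical); Y is rescaled by a factor of 10.
rng = np.random.default_rng(0)
n, dx, dy = 400, 3, 4
z = rng.standard_normal((1, n))
X = rng.standard_normal((dx, 1)) @ z + 0.5 * rng.standard_normal((dx, n))
Y = 10.0 * (rng.standard_normal((dy, 1)) @ z + 0.5 * rng.standard_normal((dy, n)))
X -= X.mean(axis=1, keepdims=True)
Y -= Y.mean(axis=1, keepdims=True)
Cxx, Cyy, Cxy = X @ X.T / n, Y @ Y.T / n, X @ Y.T / n

A = np.block([[np.zeros((dx, dx)), Cxy],
              [Cxy.T, np.zeros((dy, dy))]])
B_pls = np.eye(dx + dy)                                  # PLS row of Table 1
B_cca = np.block([[Cxx, np.zeros((dx, dy))],
                  [np.zeros((dy, dx)), Cyy]])            # CCA row of Table 1

rho_pls = top_eigenvalue(A, B_pls)   # largest singular value of Cxy (a covariance)
rho_cca = top_eigenvalue(A, B_cca)   # largest canonical correlation
```

With $B = I$ the leading eigenvalue is a covariance that scales with the data, whereas with the block-diagonal $B$ it is a normalized correlation whose square matches the leading eigenvalue of Equation (4).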


3.3. Enhancement with holistic+local feature representation 

The pixel intensity based feature vector is frequently used in the face recognition literature. However, its representation power is limited. 2D Gabor features have proved successful and are widely used in face recognition [23, 3]. But if the Gabor features are sampled from the whole face, as in the approach of Liu and Wechsler [3], the pose misalignment in the feature vectors counteracts the benefit of the Gabor features, since the Gabor wavelet is a 2D texture descriptor while pose variations are tightly related to the 3D shape of the human face. After convolution with the Gabor kernels, the geometrical distortion in the feature vectors becomes even worse than that in the pixel intensity based feature vectors. This problem can be avoided by using coarse geometrical alignment. Prince et al. [28] extract local Gabor features from local regions centered at corresponding facial landmarks, such as the eye centers and the mouth corners. This feature representation considerably improves the recognition performance. To explore the potential of the proposed method, we adopt a similar feature representation. One difference is that besides the local Gabor features, holistic features are also used in our method. The local Gabor features mainly reflect the local information of
face. Su et al. m proved that the holistic features are also 
important for representing faces. In their work, classifiers are built on the holistic features and the local Gabor features respectively. By combining the holistic and local classifiers, the performance of frontal face recognition is dramatically improved. In this paper we also use the holistic+local strategy for feature enhancement. As shown in Figure 6, the holistic intensity features are sampled on the whole face region, while the local Gabor magnitude features of 5 scales and 8 orientations are extracted on patches centered at facial landmarks, such as the eye centers and the mouth corners. It should be pointed out that the enhancement in feature representation benefits from both the coarse alignment and the Gabor features. Enhanced by this feature representation, our approach becomes a hybrid approach that integrates coarse geometric alignment and local statistical models.
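To make the local feature extraction concrete, the sketch below computes Gabor magnitude responses for 5 scales and 8 orientations at the center of a square patch. It is only a sketch under stated assumptions: the kernel parameterization, the wavelength spacing, the ratio between envelope width and wavelength, and the 31 × 31 patch size are all illustrative choices that the text does not specify.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex 2D Gabor kernel on a size x size grid (size should be odd)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)       # coordinate along the wave
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    carrier = np.exp(1j * 2.0 * np.pi * xr / wavelength)
    return envelope * carrier

def gabor_magnitudes(patch, n_scales=5, n_orient=8):
    """Magnitudes of the filter responses at the patch center, one per filter."""
    size = patch.shape[0]                            # square, odd-sized patch
    feats = []
    for s in range(n_scales):
        wavelength = 4.0 * (2.0 ** (s / 2.0))        # assumed scale spacing
        sigma = wavelength / 2.0                     # assumed envelope width
        for o in range(n_orient):
            k = gabor_kernel(size, wavelength, np.pi * o / n_orient, sigma)
            feats.append(abs(np.sum(patch * np.conj(k))))
    return np.array(feats)

# Stand-in for a patch centered at a facial landmark (random values here).
patch = np.random.default_rng(0).random((31, 31))
features = gabor_magnitudes(patch)                   # 5 x 8 = 40 magnitudes
```

Each landmark patch then yields one 40-dimensional magnitude vector per sampled position; sampling several positions inside the patch would lengthen the vector accordingly.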



Figure 6: The holistic and local facial feature representation. The holistic features are sampled from the holistic face region. The local Gabor magnitude features of 5 scales and 8 orientations are extracted from local regions centered at facial landmarks, such as the eye centers.


Based on the approach described above, we have several independent models for representing faces. Besides the holistic model, we have several local models, each of which corresponds to a facial landmark. There are two ways to utilize these models, i.e., concatenating them into a single feature vector or using them independently. In this paper we choose to use these models independently to reduce the computational cost. Consequently, a CCA-based classifier is built independently on the holistic model and on each local model. Denoting the correlation given by the $i$-th classifier as $c_i$, the final decision is derived from the mean correlation calculated by the following equation:

\[ \bar{c} = \frac{1}{N} \sum_{i=1}^{N} c_i, \tag{12} \]

where $N$ is the number of classifiers.
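The fusion step can be sketched in a few lines: each classifier provides one correlation per gallery face, the correlations are averaged over classifiers, and the gallery face with the highest mean is selected. The function name `fuse_and_identify` and the score values are hypothetical and serve only as an illustration.

```python
import numpy as np

def fuse_and_identify(score_matrix):
    """score_matrix[i, j]: correlation from classifier i for gallery face j."""
    mean_scores = np.mean(score_matrix, axis=0)    # mean over classifiers
    return int(np.argmax(mean_scores)), mean_scores

# Rows: one holistic and two local classifiers (toy numbers, not from the paper).
scores = np.array([[0.20, 0.55, 0.40],
                   [0.10, 0.70, 0.30],
                   [0.60, 0.50, 0.20]])
best, fused = fuse_and_identify(scores)
```

Here the third classifier alone would pick the first gallery face, but the averaged scores favor the second one, which is the kind of disagreement the combination is meant to smooth out.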


4. Experiments 

In order to validate the proposed approach of pose-robust 
face recognition, we performed experiments on three databases, 
i.e., the CMU PIE g), the Multi-PIE (56) and the CAS-PEAL 
database 0 respectively. In the PIE database there are 68 people whose images are captured in 13 poses with yaw and pitch angle differences. The yaw angle difference between neighboring poses is about 22.5°. The Multi-PIE database is an extended version of PIE and contains more subjects. In this database, face images are captured under 15 viewpoints in four recording sessions. In our experiments, we use the face data from session 1, which contains 249 subjects. The interval between neighboring yaw angles is about 15°. In the CAS-PEAL database, the subject is asked to look upward, downward and horizontally for capturing faces under different pitch angles. In our experiments we use the face images under horizontal pitch angle, which consist of face images of 938 subjects under 7 different yaw angles. The yaw angle difference between neighboring poses is about 15°.



Figure 7: The facial landmarks used in the experiments. 


Five landmarks are manually labeled on each face in our experiments: the two eye centers, the tip of the nose and the two corners of the mouth. Each face image is normalized using the eye centers and mouth corners. The scale of the face is normalized according to the distance between the center of the mouth and the midpoint of the two eye centers. All face images are normalized to 204 x 256 pixels. Examples of facial landmarks and normalized faces are shown in Figure 7.
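The scale normalization described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `eye_mouth_scale` and the reference distance `target_dist` are assumptions introduced here, because the text does not state the eye-to-mouth distance used in the 204 x 256 normalized image.

```python
import numpy as np

def eye_mouth_scale(left_eye, right_eye, left_mouth, right_mouth, target_dist):
    """Scale factor that maps the eye-midpoint-to-mouth-center distance
    of a face onto a chosen reference distance (hypothetical parameter)."""
    eye_mid = (np.asarray(left_eye, float) + np.asarray(right_eye, float)) / 2.0
    mouth_mid = (np.asarray(left_mouth, float) + np.asarray(right_mouth, float)) / 2.0
    dist = np.linalg.norm(mouth_mid - eye_mid)
    if dist == 0:
        raise ValueError("eye midpoint and mouth center coincide")
    return target_dist / dist

# Toy landmark coordinates in pixels, given as (x, y).
scale = eye_mouth_scale((80, 100), (120, 100), (85, 150), (115, 150),
                        target_dist=100.0)
print(scale)
```

Using the vertical eye-to-mouth distance rather than the inter-eye distance is a natural choice for yaw variation, since the horizontal distance between the eyes shrinks as the head turns.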

We performed three series of experiments. In experiment 1, the proposed method is evaluated on the CAS-PEAL and Multi-PIE databases. Although CAS-PEAL and Multi-PIE are better benchmarks for cross-pose face recognition, previous approaches were not tested on them. Therefore, in experiment 2 we compare the proposed method with related approaches on PIE. In experiments 1 and 2, the pose angles of the gallery and probe faces are assumed to be known. To evaluate the robustness to inaccurate pose estimation, experiment 3 is performed with unknown probe poses.


4.1. Experiment 1: Comparisons on CAS-PEAL and Multi-PIE

In this experiment, we test the proposed method with three types of visual features, i.e., the holistic intensity feature, the local Gabor feature and the combination of them. In these experiments the normalized holistic face region (the rectangle in Figure 7) is resized to 45 x 56. Thus, the length of the holistic feature vector is 2520. Some examples of normalized holistic face regions are shown in Figure 8. Although the holistic intensity feature is frequently used in face recognition research, its representation power is limited. To evaluate the potential of the proposed method, we align local face regions with facial landmarks and use Gabor filters for feature extraction. As shown in Figure 8, we use five facial landmarks, i.e., the two eye centers, the two corners of the mouth and the tip of the nose. These landmarks are the most salient points on the face. Unlike points on the silhouette or close to the hair, they are robust to occlusion and easy to detect. The Gabor magnitude features of 5 scales and 8 orientations are sampled from a 31 x 31 window centered at each landmark. To reduce the dimension, the window is downsampled to 7 x 7. Thus, the total length of the Gabor feature vector is 1960.

Figure 8: Examples of normalized holistic face region of Multi-PIE (top row), CAS-PEAL (middle row) and PIE (bottom row) database.

For the experiments on the Multi-PIE database, we used 100 subjects for training and 149 subjects for testing. In the experiments on the CAS-PEAL database, 200 subjects are used for training and the remaining 738 subjects are used for testing. We find that the larger the number of basis vectors, the better the performance CCA achieves. Therefore, subtracting the one dimension lost to data centering (zero mean), we set the number of basis vectors to 99 and 199 for Multi-PIE and CAS-PEAL respectively¹. Moreover, the regularization parameter a in Equation (9) is set to 10 ( ' in all experiments.

For the holistic faces, we build a CCA-based classifier. For the local features, a CCA-based classifier is built independently on each pair of local regions centered at the corresponding landmarks. Consequently, we have 1 holistic classifier and 5 local classifiers in total. The final classification decision is obtained by integrating these classifiers.

We plot the matrices of recognition results (all poses against all poses) with holistic and local features in Figure 9. The experimental results on CAS-PEAL with holistic, local and holistic+local features are given in Figure 9(a), (b) and (c) respectively. The corresponding average recognition rates are 85.73%, 97.49% and 98.72%. Figure 9(d), (e) and (f) show the results on Multi-PIE, where the average recognition rates with the holistic feature, the local feature and their fusion are 73.31%, 82.96% and 91.31% respectively.

To illustrate the relationship between performance and pose difference, we show the performance comparison under the frontal gallery pose in Figure 10. As can be seen, combining the 5 local classifiers achieves much higher performance than the single holistic classifier. The CAS-PEAL database contains the largest number of subjects among multi-pose databases. It is impressive that, when the gallery pose is frontal, recognition rates over 99% are achieved for all probe poses on the CAS-PEAL database. As we analyzed in Section 1, the pose difference makes the holistic feature vector noisy and misaligned. Compared with the holistic feature, aligned local features are more robust to pose variations. Thus, representing the face locally could improve the performance. However, this does not mean that the holistic feature is useless. When the pose difference is large, the alignment becomes inaccurate; in this scenario holistic features are more robust than local features. Our experimental results show that combining holistic features improves the performance in cases with large pose differences. Since the pose differences in the experiments on Multi-PIE are larger, the performance enhancement is clearer than that on PIE and CAS-PEAL. From Figure 10 we can see that the larger the pose difference is, the more performance enhancement we get. Integrating holistic features can considerably improve the performance of cross-pose face recognition.

1 We simply use the maximum number of dimensions to preserve as much energy as possible. Because the training data are centered by subtracting the mean face, the maximum dimension is n-1, where n is the number of training samples.
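The feature lengths and subspace sizes quoted in this experiment follow from simple arithmetic, which the short check below reproduces. The helper name `max_basis_vectors` is introduced here for illustration, and the check assumes one coupled training sample per subject in each pose, so that the number of training samples equals the number of training subjects.

```python
# Holistic intensity feature: one value per pixel of the 45 x 56 region.
holistic_len = 45 * 56

# Gabor magnitude feature at one landmark: 5 scales x 8 orientations,
# each response map downsampled to a 7 x 7 window.
gabor_len = 5 * 8 * 7 * 7

def max_basis_vectors(n_train_samples):
    """Centering removes one degree of freedom, leaving n - 1 dimensions."""
    return n_train_samples - 1

print(holistic_len, gabor_len)                          # 2520 1960
print(max_basis_vectors(100), max_basis_vectors(200))   # 99 199
```

The value 1960 is therefore the feature length at a single landmark, which is consistent with building one local classifier per landmark.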

In Figure 10 we also compare the proposed cross-pose face recognition method with partial least squares (PLS) [29], ridge regression [34] and tied factor analysis (TFA) [28]². In the experiments, the number of tied factors is set to 32. We can see that CCA outperforms PLS and ridge regression using the same visual features. As described in Section 3.2, CCA is better than PLS at controlling the variance of intra-individual correlations. As a result, it achieves better performance than PLS does. When both use holistic features, CCA outperforms TFA on CAS-PEAL. Among the 14 probe poses in Multi-PIE, CCA and TFA achieve the same accuracy in 1 pose, TFA is better in 8 poses, and CCA outperforms TFA in 5 poses.
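For readers who want to see the CCA matching step being compared here in concrete form, the following NumPy sketch fits a regularized CCA on coupled feature vectors from two poses and identifies probes by correlation in the projected space. It is a sketch under stated assumptions rather than the authors' implementation: the function names, the regularization value `reg=1e-3`, the Cholesky-plus-SVD solver and the use of cosine similarity as the matching score are all choices made here, and the data are synthetic.

```python
import numpy as np

def fit_cca(X, Y, n_components, reg=1e-3):
    """Fit regularized CCA; row i of X and row i of Y share an identity."""
    mean_x, mean_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mean_x, Y - mean_y
    n = X.shape[0]
    # Regularized within-pose covariances (reg is a hypothetical value).
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    # Whiten both views with Cholesky factors; the singular values of the
    # whitened cross-covariance are the canonical correlations.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    T = np.linalg.solve(Ly, np.linalg.solve(Lx, Cxy).T).T
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    Wx = np.linalg.solve(Lx.T, U[:, :n_components])
    Wy = np.linalg.solve(Ly.T, Vt[:n_components].T)
    return (mean_x, Wx), (mean_y, Wy), s[:n_components]

def identify(gallery, probes, model_x, model_y):
    """For each probe row, return the index of the best-matching gallery row."""
    (mean_x, Wx), (mean_y, Wy) = model_x, model_y
    G = (gallery - mean_x) @ Wx
    P = (probes - mean_y) @ Wy
    G /= np.linalg.norm(G, axis=1, keepdims=True)
    P /= np.linalg.norm(P, axis=1, keepdims=True)
    return np.argmax(P @ G.T, axis=1)

# Synthetic "two pose" data: both views are noisy linear maps of shared factors.
rng = np.random.default_rng(0)
Z = rng.normal(size=(60, 5))
X = Z @ rng.normal(size=(5, 20)) + 0.01 * rng.normal(size=(60, 20))
Y = Z @ rng.normal(size=(5, 15)) + 0.01 * rng.normal(size=(60, 15))
model_x, model_y, corrs = fit_cca(X, Y, n_components=5)
pred = identify(X, Y, model_x, model_y)
accuracy = float(np.mean(pred == np.arange(60)))
print("canonical correlations:", np.round(corrs, 3))
print("rank-1 accuracy on the toy data:", accuracy)
```

Each pose gets its own linear transform (`Wx`, `Wy`), so a gallery face in one pose and a probe face in another are compared only after both have been mapped into the shared, maximally correlated subspace.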

4.2. Experiment 2: Comparisons on PIE 

The Multi-PIE and CAS-PEAL databases were released more recently than PIE and contain more subjects. In particular, our experiments on CAS-PEAL use 738 subjects for testing, far more than in previous experiments. The large scale of these data sets makes the experimental results more convincing. However, previous works on cross-pose face recognition did not report results on Multi-PIE and CAS-PEAL. For comparison with previous works, we also conduct experiments on the PIE database.


Table 2: Performance comparison on CMU PIE database. 


Work                                 Pose diff. (°)           Rec. rate (%)
Blanz and Vetter [11]                front / side / profile   99.8 / 97.8 / 79.5
Prince et al. [28]                   22.5 / 90                100 / 91
Chai et al. [33]                     22.5 / 45                98.5 / 89.7
Kanade and Yamada [36]               22.5 / 45 / 90           100 / 100 / 47
Gross et al. [27] (ELF Complex)      22.5 / 45 / 90           93 / 88 / 39
Castillo and Jacobs [21] (4ptSMD)    22.5 / 45 / 90           100 / 97 / 62
Sharma and Jacobs [30]               22.5 / 45 / 90           100 / 88 / 79
Asthana et al. [14]                  22.5 / 45                100 / 98.5
Mostafa et al. [17]                  22.5 / 45                100 / 95.6
Our method                           22.5 / 45 / 90           100 / 100 / 85.29


The PIE database contains 68 people whose images are captured in 13 poses with yaw and pitch angle differences. The yaw angle difference between neighboring poses is about 22.5°. In the experiments, one half of the data is used for training and the remaining half for testing. The gallery and probe poses are known in this experiment. The feature representation and parameter settings are the same as those in the experiments on Multi-PIE and CAS-PEAL.

The experimental results are given in the bottom row of Figure 9. The average recognition rates with the holistic feature, the local feature and their fusion are 77.46%, 91.98% and 95.28%. In Figure 10³ we show the experimental results using frontal gallery faces, with comparisons to the Eigen-light field proposed by Gross et al. [27] and the multi-subregion matching method proposed by Kanade and Yamada [36]. All three methods shown in Figure 10 use 34 subjects for testing, so the comparison is relatively fair. Using only the holistic intensity features, our method outperforms the Eigen-light field, which also represents the face holistically, in most viewpoints. The performance of our method with holistic features is lower than that of Kanade and Yamada's approach, which is based on local patches. But when local Gabor features are used, the performance of our method is better.

To our knowledge, the 3D morphable model (3DMM) proposed by Blanz and Vetter [11] and the tied factor analysis (TFA) proposed by Prince et al. [28] are the state of the art in cross-pose face recognition. For comparison with the results reported in [28], empirical comparisons in three probe views (c05, c11, c22) on the PIE database are given in Table 2⁴ (the gallery pose is frontal). Besides 3DMM and TFA, other notable results on PIE are also listed. From the table, we can see that TFA achieves the best results, and the performance of our method using holistic+local features is between 3DMM and TFA. It should be pointed out that the comparisons are only empirical, because the experimental settings and feature representation methods differ. In the experiments of TFA and our method, only half of the subjects are used for testing, whereas in the experiments of 3DMM all the subjects are included in the testing set. The number of facial landmarks used in the experiments also influences the power of the feature representation. In our experiments we use only 5 landmarks, while 6-8 and 14 manually labeled landmarks are used in the experiments of 3DMM and TFA respectively. It could be expected that the performance of our method would be further enhanced by using more facial landmarks. Empirically speaking, the proposed method is close to the state of the art.

2 The implementation of TFA is based on the code provided at http://web4.cs.ucl.ac.uk/staff/s.prince/TiedFactorAnalysis.zip

3 In Figure 10 we directly cite the experimental results reported in [27] and [36].

Figure 10: Experimental results of cross-pose face recognition using frontal gallery faces and non-frontal probe faces on the CAS-PEAL, PIE and Multi-PIE data sets.

4.3. Experiment 3: When probe pose is unknown 

The probe pose is assumed to be known in experiments 1 and 2. To evaluate the robustness of the proposed method to inaccurate head pose estimation, we conduct experiments on the CAS-PEAL database with unknown probe poses. As in experiments 1 and 2, face images of 200 subjects are used for training, while the remaining 738 subjects are used for testing. The gallery pose is frontal. The probe pose is non-frontal and unknown, so there are six possible poses in total. The experimental results with holistic, local and “holistic+local” feature representations are presented as 6x6 “confusion matrices” in Tables 3, 4 and 5 respectively. In these tables “EP” denotes the estimated probe pose and “RP” denotes the real probe pose. In these experiments the facial landmarks are again manually labeled.

4 The experimental results in Table 2, except those of our method, are cited from the corresponding references.

As can be seen, the local feature representation is more robust to inaccurate pose estimation. Besides the alignment and feature representation, the recognition performance is also affected by the difference between the gallery pose and the real probe pose: the smaller the difference, the higher the performance. In Figure 11 we illustrate the average performance decline when the difference between the estimated pose and the real pose is ±15°, ±30° and ±45°. We can see that recognizing faces across pose is still difficult when the pose estimation error is large. When the pose estimation error is about 15°, the average decrease in performance is 2.50% with the “holistic+local” feature representation. We can conclude that the proposed method is robust under this condition. According to a recent survey, achieving an error of less than 15° is not difficult for pose estimation techniques.
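One plausible way to compute such an average decline from a confusion matrix is sketched below, using the Table 5 values for the “holistic+local” representation. The averaging rule is an assumption made here, since the text does not spell it out: for every pair of poses that are `offset` steps apart in the pose list, the drop is the diagonal (correct-pose) rate of a row minus the off-diagonal entry in that row. Note that stepping from -15° to 15° is counted as one step even though the angular gap is 30°.

```python
import numpy as np

# Table 5 ("holistic+local"), poses ordered -45, -30, -15, 15, 30, 45 degrees.
table5 = np.array([
    [100.00, 99.45, 98.50, 60.97, 25.74, 16.12],
    [96.34, 99.86, 100.00, 78.86, 51.62, 32.65],
    [84.41, 99.18, 100.00, 94.03, 71.40, 43.22],
    [38.88, 67.20, 95.52, 100.00, 97.83, 74.66],
    [40.78, 67.61, 91.73, 100.00, 100.00, 92.41],
    [34.95, 50.54, 79.53, 99.59, 99.59, 99.72],
])

def mean_decline(conf, offset):
    """Average drop from the diagonal for entries `offset` pose steps away."""
    diag = np.diag(conf)
    drops = []
    for i in range(conf.shape[0] - offset):
        j = i + offset
        drops.append(diag[i] - conf[i, j])   # error in one direction
        drops.append(diag[j] - conf[j, i])   # error in the other direction
    return float(np.mean(drops))

for k in (1, 2, 3):
    print(k, round(mean_decline(table5, k), 2))
```

Under this reading the one-step average comes to about 2.5 percentage points, in line with the 2.50% quoted above, but the match should be taken as suggestive rather than as confirmation of the authors' exact procedure.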


Table 3: The confusion matrix of recognition rate based on holistic features. 


EP \ RP    -45°      -30°      -15°      15°       30°       45°
-45°       88.75%    52.57%    36.72%    22.76%    12.46%    7.31%
-30°       66.80%    94.17%    85.36%    45.79%    17.88%    9.07%
-15°       31.30%    70.32%    94.44%    51.62%    21.40%    7.31%
15°        20.86%    52.84%    87.12%    100%      57.18%    22.08%
30°        16.53%    36.58%    67.88%    87.66%    90.51%    40.92%
45°        11.11%    18.56%    26.01%    43.22%    54.60%    75.88%


Table 4: The confusion matrix of recognition rate based on local features. 


EP \ RP    -45°      -30°      -15°      15°       30°       45°
-45°       100%      99.18%    98.50%    49.45%    18.97%    13.55%
-30°       92.41%    99.86%    100%      69.91%    49.05%    31.57%
-15°       85.77%    99.05%    100%      95.12%    75.33%    45.25%
15°        38.61%    64.36%    91.86%    99.86%    98.23%    77.64%
30°        29.67%    46.47%    79.40%    99.86%    100%      94.85%
45°        18.83%    33.33%    65.31%    99.05%    99.05%    99.45%


Table 5: The confusion matrix of recognition rate based on “holistic+local” feature representation.


EP \ RP    -45°      -30°      -15°      15°       30°       45°
-45°       100%      99.45%    98.50%    60.97%    25.74%    16.12%
-30°       96.34%    99.86%    100%      78.86%    51.62%    32.65%
-15°       84.41%    99.18%    100%      94.03%    71.40%    43.22%
15°        38.88%    67.20%    95.52%    100%      97.83%    74.66%
30°        40.78%    67.61%    91.73%    100%      100%      92.41%
45°        34.95%    50.54%    79.53%    99.59%    99.59%    99.72%


Figure 11: The average performance declination. Horizontal axis: pose difference (°).


5. Conclusions and future works

In this paper we proposed a novel approach for recognizing faces across different poses. We showed that it is the misalignment in feature vectors that makes cross-pose face recognition very difficult. However, by learning via canonical correlation analysis from face pairs coupled by identity, the intra-subject correlations can be maximized across different poses. Thus, the problem of matching misaligned vectors is statistically avoided. We conducted experiments on the largest and latest multi-pose data sets, and the results show that the proposed approach is very effective. Our experiments also show that integrating local and holistic features can further improve the recognition performance, especially when the pose difference is large and the local appearance information is therefore unreliable because of occlusion. In this situation, combining holistic appearance yields a larger performance gain.

As a classical statistical learning method, CCA has been improved since it was proposed in 1936. As one direction of future work, these improved models can be adopted to enhance the proposed pose-robust face recognition approach. Additionally, accurately aligned facial landmarks play an important role in the proposed method, and in real-world applications landmark localization could be an unstable factor. We will study this problem by introducing elastic matching techniques, which have proved effective for real-world face recognition in recent years.

Figure 9: Experimental results on CAS-PEAL (top row), Multi-PIE (middle row) and PIE (bottom row) database with holistic (a,d,g), local (b,e,h) and holistic+local (c,f,i) feature representation.







